Open access at MIT and Harvard

MIT has recently adopted an open access policy, under which all MIT faculty grant MIT the right to make their scholarly publications available on an open access basis, with the possibility of a waiver on a case-by-case basis.

Story via John Baez. This follows a similar initiative at Harvard. Both announcements are non-retrospective.

Site problems

As many have noticed, we had a few problems with the site yesterday. Thanks to everyone who emailed to let me know they were experiencing difficulties.

We think the problem has been solved, and that the site should be stable now.

Swift: making web applications secure by construction

Swift is a language-based approach to building web applications that are secure by construction. Swift applications are written in the Jif language, a Java-based language that incorporates "security-typing" to manage the flow of information within an application. The Swift compiler automatically partitions the application code into a client-side JavaScript application and a server-side Java application, with code placement constrained by declarative information flow policies that strongly enforce the confidentiality and integrity of server-side information.
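To give a feel for security typing without reproducing Jif's actual syntax, here is a rough Haskell sketch of my own (not Jif or Swift code): values carry a confidentiality level in their types, and the type checker rejects flows from high to low.

    {-# LANGUAGE DataKinds, KindSignatures #-}
    -- A sketch of information-flow typing: the label is a phantom type
    -- parameter, so leaking High data to a Low channel is a type error.

    data Level = Low | High

    newtype Labeled (l :: Level) a = Labeled a

    -- Public data may always be treated as secret, never the reverse.
    classify :: Labeled 'Low a -> Labeled 'High a
    classify (Labeled x) = Labeled x

    password :: Labeled 'High String
    password = Labeled "secret"

    greeting :: Labeled 'Low String
    greeting = Labeled "Welcome!"

    -- Code that ends up on the client may only receive Low data,
    -- so `sendToClient password` simply does not type-check.
    sendToClient :: Labeled 'Low String -> IO ()
    sendToClient (Labeled s) = putStrLn ("client <- " ++ s)

    main :: IO ()
    main = sendToClient greeting

Jif's labels are considerably richer than this (principals, integrity as well as confidentiality, declassification), and Swift additionally uses them to decide which code and data may safely be placed on the client.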

Swift was recently featured in the "Research Highlights" section of the Communications of the ACM, in a condensed version of an earlier conference paper. The original paper is Stephen Chong, Jed Liu, Andrew C. Myers, Xin Qi, K. Vikram, Lantian Zheng, and Xin Zheng, Secure web applications via automatic partitioning, Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP'07), pages 31–44, October 2007.

Jif has been mentioned previously on LtU here and here.

Marrying VMs

VMKit is an LLVM project; per the talk abstract from the Proceedings of the 2008 LLVM Developers' Meeting:

VMKit is an implementation of the Java and .NET Virtual Machines that uses LLVM to optimize and JIT compile the code. This talk [slides, video] describes how VMKit integrates components from various systems and how bytecode translation works, describes the current performance status of the system, and discusses areas for future extension.

Lawvere Theories and Monads

Martin Hyland and John Power (2007). The Category Theoretic Understanding of Universal Algebra: Lawvere Theories and Monads. ENTCS 172:437-458.

Both monads and Lawvere theories provide characterisations of algebraic structure, with monads providing the more general characterisation. The authors give an introduction to Lawvere theories, discuss their relationship to sets, and explain why monads became the more popular treatment.

They then tackle the application of the theory to the semantics of side effects, arguing that the generality of monads lets them characterise computational phenomena that are not really side effects, such as partiality and continuations, while Lawvere theories more cleanly characterise what side effects actually are.
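A miniature of the operations-first view, as a Haskell sketch rather than the paper's categorical development: a notion of side effect is presented by its operations (a genuine Lawvere theory would also impose equations between them), and the familiar monad is what arises from sequencing those operations.

    {-# LANGUAGE DeriveFunctor #-}
    -- The "theory" of a single mutable cell: two operations, get and put.
    data StateOp s k = Get (s -> k) | Put s k
      deriving Functor

    -- The free monad over that signature: trees of operations. (Equations
    -- such as "put s >> get = put s >> return s" are ignored here.)
    data Free f a = Pure a | Op (f (Free f a))

    instance Functor f => Functor (Free f) where
      fmap g (Pure a) = Pure (g a)
      fmap g (Op k)   = Op (fmap (fmap g) k)

    instance Functor f => Applicative (Free f) where
      pure = Pure
      Pure g <*> x = fmap g x
      Op k   <*> x = Op (fmap (<*> x) k)

    instance Functor f => Monad (Free f) where
      Pure a >>= g = g a
      Op k   >>= g = Op (fmap (>>= g) k)

    -- Interpreting the operations recovers the usual state monad s -> (a, s).
    runState :: Free (StateOp s) a -> s -> (a, s)
    runState (Pure a)        s = (a, s)
    runState (Op (Get k))    s = runState (k s) s
    runState (Op (Put s' k)) _ = runState k s'

Continuations, by contrast, form a monad but do not arise from operations and equations in this way, which is roughly the distinction the authors draw.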

This paper is a good introduction to an important line of recent research done by Hyland and Power; cf. also the LtU story Combining computational effects.

A New Approach to the Functional Design of a Digital Computer

A New Approach to the Functional Design of a Digital Computer by R. S. Barton, 1961.

The present methods of determining the functional design of computers are critically reviewed and a new approach proposed. This is illustrated by explaining, in abstracted form, part of the control organization of a new and different machine based, in part, on the ALGOL 60 language. The concepts of expression and procedure lead directly to the use of a Polish string program. A new arrangement of control registers results, which provides for automatic allocation of temporary storage within expressions and procedures, and a generalized subroutine linkage.

The simplicity and power of these notions suggest that there is much room for improvement in present machines and that more attention should be given to control functions in new designs.
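To make the "Polish string" idea concrete, here is a small illustration of my own (not from the paper): an ALGOL-style expression such as (a + b) * (c - d) becomes the postfix string a b + c d - *, and evaluating that string against a pushdown stack allocates every intermediate temporary automatically.

    -- A minimal evaluator for a Polish (postfix) string of operands and operators.
    data Token = Operand Int | Add | Sub | Mul

    eval :: [Token] -> [Int] -> [Int]
    eval []               stack          = stack
    eval (Operand n : ts) stack          = eval ts (n : stack)
    eval (Add : ts)       (y : x : rest) = eval ts (x + y : rest)
    eval (Sub : ts)       (y : x : rest) = eval ts (x - y : rest)
    eval (Mul : ts)       (y : x : rest) = eval ts (x * y : rest)
    eval _                _              = error "ill-formed Polish string"

    -- (1 + 2) * (7 - 3)  ==>  [12]
    example :: [Int]
    example = eval [Operand 1, Operand 2, Add, Operand 7, Operand 3, Sub, Mul] []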

"One of the most amazing far reaching 4 page papers in our field" referenced in A Conversation with Alan Kay.

A Tiny Computer

A Tiny Computer. An unpublished memo by Chuck Thacker, Microsoft Research, 3 September 2007. Posted with permission.

Alan Kay recently posed the following problem:

"I'd like to show JHS and HS kids 'the simplest non-tricky architecture' in which simple gates and flipflops manifest a programmable computer”.

Alan posed a couple of other desiderata, primarily that the computer needs to demonstrate fundamental principles, but should be capable of running real programs produced by a compiler. This introduces some tension into the design, since simplicity and performance sometimes are in conflict.

This sounded like an interesting challenge, and I have a proposed design.

Presents the design of a complete CPU in under two pages of FPGA-ready Verilog. The TC3 is a Harvard architecture 32-bit RISC with 1KB of instruction memory, 1KB of data memory, and 128 general-purpose registers. This design is an ancestor of the DDR2 DRAM Controller used in the BEE3.

To help us into the brave new world of Hardware/Software co-design!

Barbara Liskov Wins Turing Award

News flash: Barbara Liskov Wins Turing Award. The full citation:

Barbara Liskov has led important developments in computing by creating and implementing programming languages, operating systems, and innovative systems designs that have advanced the state of the art of data abstraction, modularity, fault tolerance, persistence, and distributed computing systems.

The Venus operating system was an early example of principled operating system design. The CLU programming language was one of the earliest and most complete programming languages based on modules formed from abstract data types and incorporating unique intertwining of both early and late binding mechanisms. ARGUS extended many of the CLU ideas to distributed programming, and incorporated the first versions of nested transactions to maintain predictable consistencies. Other advances include solutions elegantly combining theory and pragmatics in the areas of decentralized information flow, replicated storage and caching of persistent objects, and modular upgrading of distributed systems. Her contributions have been incorporated into the practice of programming, thereby influencing many of the most important systems used today: for programming, specification, systems design, and distributed architectures.

Here is a DDJ interview, in which Liskov mentions CLU and data abstraction as the accomplishment she is most proud of.

And here are searches of the LtU archives for Liskov and CLU.

Languages and security: a short reading list

Ivan Krstić, former director of One Laptop per Child and all-around computer security guru, has a few humorous thoughts on the current intersection between security and programming language design in Languages and security: a short reading list.

If I had to grossly overgeneralize, I’d say people looking at language security fall in roughly three schools of thought:

1. The "My name is Correctness, king of kings" people say that security problems are merely one manifestation of incorrectness, which is dissonance between what the program is supposed to do and what its implementation actually does. This tends to be the group led by mathematicians, and you can recognize them because their solutions revolve around proofs and the writing and (automatic) verification thereof.

2. The "If you don’t use a bazooka, you can’t blow things up" people say that security problems are a byproduct of exposing insufficiently intelligent or well-trained programmers to dangerous language features that don’t come with a safety interlock. You can identify these guys because they tend to make new languages that no one uses, and frequently describe them as "like popular language X but safer".

3. The "We need to change how we fundamentally build software" people say that security problems are the result of having insufficiently fine-grained methods for delegating individual bits of authority to individual parts of a running program, which traditionally results in all parts of a program having all the authority, which means the attack surface becomes a Cartesian product of every part of the program and every bit of authority which the program uses. You can spot these guys because they tend to throw around the phrase "object-capability model".

Now, while I'm already grossly overgeneralizing, I think the first group is almost useless, the second group is almost irrelevant, and the third group is absolutely horrible at explaining what the hell they’re talking about.

Tongue in cheek? Absolutely, but probably not that far off when it comes to the languages that the mainstream uses today (except, arguably, for the quibble that #2 is applied to some extent in all of the most popular "managed" language runtimes).

As the name of the article suggests, it has some good links for further study of current lines of research.

Which directions are likely to be the most fruitful in the coming years? And what other directions are being missed?

Linear Logic and Permutation Stacks--The Forth Shall Be First

Linear Logic and Permutation Stacks--The Forth Shall Be First by Henry Baker, 1993.

Girard's linear logic can be used to model programming languages in which each bound variable name has exactly one "occurrence"--i.e., no variable can have implicit "fan-out"; multiple uses require explicit duplication. Among other nice properties, "linear" languages need no garbage collector, yet have no dangling reference problems. We show a natural equivalence between a "linear" programming language and a stack machine in which the top items can undergo arbitrary permutations. Such permutation stack machines can be considered combinator abstractions of Moore's Forth programming language.
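A toy rendering of the correspondence (my sketch, not Baker's formal development): if every variable in a term is consumed exactly once, the compiled stack code only ever needs to permute the items on top of the stack, while a non-linear term is exactly one whose stack code needs an explicit Dup.

    -- A tiny permutation stack machine; Dup is the one non-linear instruction.
    data Instr = Push Int | Swap | Add | Sub | Mul | Dup

    run :: [Instr] -> [Int] -> [Int]
    run []            s           = s
    run (Push n : is) s           = run is (n : s)
    run (Swap   : is) (a : b : s) = run is (b : a : s)
    run (Add    : is) (b : a : s) = run is (a + b : s)
    run (Sub    : is) (b : a : s) = run is (a - b : s)
    run (Mul    : is) (b : a : s) = run is (a * b : s)
    run (Dup    : is) (a : s)     = run is (a : a : s)
    run _             _           = error "stack underflow"

    -- \x y -> y - x : each variable used once, so a permutation suffices.
    linearExample :: [Int]
    linearExample = run [Push 3, Push 10, Swap, Sub] []   -- x = 3, y = 10  ==>  [7]

    -- \x -> x * x : x is used twice, so the code must duplicate explicitly.
    nonLinearExample :: [Int]
    nonLinearExample = run [Push 6, Dup, Mul] []          -- [36]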

I remembered this paper while chatting with a friend who's designing a stack-based instruction set and looking for relevant compilation techniques (imagine compiling C to Forth). Do you have some relevant references?

Today I found this paragraph particularly intriguing:

Since Forth is usually implemented on a traditional von Neumann machine, one thinks of the return stack as holding "return addresses". However, in these days of large instruction caches, in which entire cache lines are read from the main memory in one transaction, this view should be updated. It is well-known that non-scientific programs have a very high rate of conditional branches, with the mean number of instructions between branches being on the order of 10 or less. Forth programs are also very short, with "straight-line" (non-branching) sequences averaging 10 items or less. In these environments, it makes more sense to view the return stack itself as the instruction buffer cache! In other words, the return stack doesn't hold "return addresses" at all, but the instructions themselves! When a routine is entered, the entire routine is dumped onto the top of the return stack, and execution proceeds with the top item of this stack. Since routines are generally very short, the transfer of an entire routine is about the same amount of work as transferring a complete cache line in present architectures. Furthermore, an instruction stack-cache-buffer is normally accessed sequentially, and therefore can be implemented using shift register technology. Since a shift register can be shifted faster than a RAM can be accessed, the "access time" of this instruction stack-cache-buffer will not be a limiting factor in a machine's speed. Executing a loop in an instruction stack-cache-buffer is essentially the making of connections necessary to create a cyclic shift register which literally cycles the instructions of the loop around the cyclic shift register.

Imagine that!
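The quoted idea is easy to prototype. Here is a toy sketch of my own (names and details invented for illustration): there is no program counter and no return addresses; the instructions still to be executed live on a stack, and calling a word simply dumps its body on top of that stack.

    import qualified Data.Map as M

    data Instr = Lit Int | Plus | Call String

    type Dict = M.Map String [Instr]      -- named routines ("words")

    -- The second argument is the instruction stack: executing a Call pushes
    -- the callee's body onto it, instead of fetching via a return address.
    exec :: Dict -> [Instr] -> [Int] -> [Int]
    exec _    []              ds           = ds
    exec dict (Lit n  : rest) ds           = exec dict rest (n : ds)
    exec dict (Plus   : rest) (b : a : ds) = exec dict rest (a + b : ds)
    exec dict (Call w : rest) ds           = exec dict (body ++ rest) ds
      where body = M.findWithDefault (error ("unknown word: " ++ w)) w dict
    exec _    _               _            = error "stack underflow"

    dict1 :: Dict
    dict1 = M.fromList [("add3", [Lit 3, Plus])]

    -- exec dict1 [Lit 4, Call "add3"] []  ==>  [7]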